Approximate data mining in very large relational data

Authors

  • James C. Bezdek
  • Richard J. Hathaway
  • Christopher Leckie
  • Kotagiri Ramamohanarao
Abstract

In this paper we discuss eNERF, an extended version of non-Euclidean relational fuzzy c-means (NERFCM) for approximate clustering in very large (unloadable) relational data. The eNERF procedure consists of four parts: (i) selection of distinguished features by algorithm DF, to be monitored during progressive sampling; (ii) progressive sampling of a square N × N relation matrix RN by algorithm PS until an n × n sample relation Rn passes a goodness-of-fit test; (iii) clustering of Rn using algorithm LNERF; and (iv) extension of the LNERF results to RN-Rn by algorithm xNERF, which uses an iterative procedure based on LNERF to compute fuzzy membership values for all of the objects remaining after LNERF clustering of the accepted sample. Three of the four algorithms are new; only LNERF (called NERFCM in the original literature) precedes this article.
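Parts (iii) and (iv) can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes plain relational fuzzy c-means on a small sample relation whose entries are squared Euclidean distances, followed by a one-pass extension to objects outside the sample. The function names (`cluster_distances`, `memberships`, `rfcm`, `extend`) and all parameters are hypothetical. Clamping of non-positive distances stands in for the non-Euclidean correction of NERFCM, and the one-pass extension is a simplification of the iterative xNERF step. Algorithms DF and PS are not shown.

```python
import random


def cluster_distances(R, V, rows):
    """Relational distances from each object in `rows` to each cluster.

    R is the n x n sample relation, V holds one weight vector per cluster,
    and each entry of `rows` lists one object's dissimilarities to the n
    sample objects. Uses d = r.v - (v^T R v) / 2.
    """
    n = len(R)
    D = []
    for v in V:
        Rv = [sum(R[s][j] * v[j] for j in range(n)) for s in range(n)]
        half_quad = 0.5 * sum(v[s] * Rv[s] for s in range(n))
        D.append([sum(r[j] * v[j] for j in range(n)) - half_quad for r in rows])
    return D


def memberships(D, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update from a c x K distance table."""
    p = 1.0 / (m - 1.0)
    c, K = len(D), len(D[0])
    U = [[0.0] * K for _ in range(c)]
    for k in range(K):
        # Assumption: clamping replaces the spreading correction of NERFCM.
        inv = [(1.0 / max(D[i][k], eps)) ** p for i in range(c)]
        total = sum(inv)
        for i in range(c):
            U[i][k] = inv[i] / total
    return U


def rfcm(R, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster the sample relation R; returns memberships U and weights V."""
    n = len(R)
    rng = random.Random(seed)
    # Random initial fuzzy partition: each column of U sums to 1.
    U = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V = []
    for _ in range(iters):
        # Cluster weight vectors: normalised m-th powers of the memberships.
        V = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            total = sum(w)
            V.append([x / total for x in w])
        # For a symmetric R, the rows of R are the sample objects themselves.
        U_new = memberships(cluster_distances(R, V, R), m)
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = U_new
        if change < tol:
            break
    return U, V


def extend(R, V, rows, m=2.0):
    """One-pass memberships for objects that were not in the sample."""
    return memberships(cluster_distances(R, V, rows), m)


# Toy sample relation: squared differences between six points on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
R = [[(a - b) ** 2 for b in points] for a in points]
U, V = rfcm(R, c=2)
# Two out-of-sample objects, described only by dissimilarities to the sample.
new_rows = [[(x - b) ** 2 for b in points] for x in (1.5, 10.5)]
U_ext = extend(R, V, new_rows)
```

The extension only needs each remaining object's dissimilarities to the n sample objects, so the full N × N relation never has to be loaded at once in this sketch.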

Similar resources

Approximate resistivity and susceptibility mapping from airborne electromagnetic and magnetic data, a case study for a geologically plausible porphyry copper unit in Iran

This paper describes the application of approximate methods to invert airborne magnetic data as well as helicopter-borne frequency-domain electromagnetic data in order to retrieve a joint model of magnetic susceptibility and electrical resistivity. The study area, located in Semnan province of Iran, consists of an arc-shaped porphyry andesite covered by sedimentary units which may have potential ...

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Mining Approximate Functional Dependencies from Databases Based on Minimal Cover and Equivalent Classes

Data Mining (DM) represents the process of extracting interesting and previously unknown knowledge from data. Approximate Functional Dependencies (AFD) mined from database relations represent potentially interesting patterns and have proven to be useful for various tasks like feature selection for classification, query optimization and query rewriting. The discovery of AFDs still remains under ...

Mining Approximate Keys Based on Reasoning from XML Data

Keys are very important for data management. Due to the hierarchical structure and syntactic flexibility of XML, mining keys from XML data is a more complex and difficult task than from relational databases. In discovering keys from XML data there are some challenges in practice such as unclearness of keys, storage of enormous keys, efficient mining algorithms, etc. In this paper, in order to f...

A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

The amount of data produced by ubiquitous computing applications is quickly growing, due to the pervasive presence of small devices endowed with sensing, computing and communication capabilities. Heterogeneity and strong interdependence, which characterize ‘ubiquitous data’, require a (multi-)relational approach to their analysis. However, relational data mining algorithms do not scale well and...

Publication date: 2006